arXiv:1710.11241v1 [cs.LG] 30 Oct 2017
Theoretical properties of the global optimizer of two layer neural network

Author

  • Guanghui Lan
Abstract

In this paper, we study the problem of optimizing a two-layer artificial neural network that best fits a training dataset. We look at this problem in the setting where the number of parameters is greater than the number of sampled points. We show that for a wide class of differentiable activation functions (this class contains "almost" all functions that are not piecewise linear), first-order optimal solutions satisfy global optimality provided the hidden layer is non-singular. Our results extend easily from the case of a square hidden-layer matrix to that of a flat (wide) matrix. They also apply when the network has more than one hidden layer, provided all hidden layers satisfy the non-singularity condition, all activations come from the given "good" class of differentiable functions, and the optimization is carried out only with respect to the last hidden layer. We also study the smoothness properties of the objective function and show that it is Lipschitz smooth, i.e., its gradients do not change sharply. We use these smoothness properties to guarantee convergence to a first-order optimal solution at a rate of O(1/K), where K is the number of iterations. Finally, we show that our algorithm maintains non-singularity of the hidden layer for any finite number of iterations.
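
To make the setting concrete, the following is a minimal Python sketch, not the paper's algorithm: plain gradient descent on a two-layer network with squared loss, a sigmoid activation (smooth and not piecewise linear), and more hidden units than training samples. The dimensions, step size, and the full-rank check on the hidden-layer output matrix are illustrative assumptions rather than the paper's exact construction.

# Minimal sketch (assumptions noted above): gradient descent on an
# over-parameterized two-layer network. We track the gradient norm, whose
# decay is the kind of first-order guarantee the abstract refers to, and the
# smallest singular value of the hidden-layer output matrix as an
# illustrative stand-in for the non-singularity condition.
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 20, 5, 32                  # samples, input dim, hidden width (k > n)
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))

W = 0.5 * rng.normal(size=(d, k))    # hidden-layer weights
v = 0.5 * rng.normal(size=(k, 1))    # output-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # differentiable, not piecewise linear

def loss_and_grads(W, v):
    H = sigmoid(X @ W)               # hidden-layer outputs, shape (n, k)
    r = H @ v - y                    # residuals
    loss = 0.5 * np.mean(r ** 2)
    gv = H.T @ r / n                 # gradient w.r.t. output weights
    gW = X.T @ ((r @ v.T) * H * (1.0 - H)) / n   # gradient w.r.t. hidden weights
    return loss, gW, gv, H

eta = 0.5                            # constant step size; Lipschitz smoothness
for _ in range(2000):                # of the objective is what justifies it
    loss, gW, gv, H = loss_and_grads(W, v)
    W -= eta * gW
    v -= eta * gv

loss, gW, gv, H = loss_and_grads(W, v)
grad_norm = np.sqrt(np.linalg.norm(gW) ** 2 + np.linalg.norm(gv) ** 2)
sigma_min = np.linalg.svd(H, compute_uv=False).min()   # > 0: full row rank
print(f"loss={loss:.2e}  grad_norm={grad_norm:.2e}  sigma_min(H)={sigma_min:.2e}")

In this sketch, a small gradient norm signals an approximate first-order optimal point, and a strictly positive smallest singular value of H plays the role of the non-singularity condition under which the abstract identifies such points as globally optimal.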


Similar resources

arXiv:1710.05140v1 [cs.CC] 14 Oct 2017
On complexity of multidistance graph recognition in R^1
Mikhail Tikhomirov

Let A be a set of positive numbers. A graph G is called an A-embeddable graph in R^1 if the vertices of G can be positioned in R^1 so that the distance between the endpoints of any edge is an element of A. We consider the computational problem of recognizing A-embeddable graphs in R^1 and classify all finite sets A by the complexity of this problem in several natural variations.


arXiv:0710.2470v1 [hep-lat] 12 Oct 2007
Lattice Approach to Light Scalars

I report on lattice QCD calculations that study the properties of the a0 and f0 mesons.


Aggregating Algorithm for Prediction of Packs

This paper formulates the protocol for prediction of packs, which is a special case of prediction under delayed feedback. Under this protocol, the learner must make a few predictions without seeing the outcomes, and then the outcomes are revealed. We develop the theory of prediction with expert advice for packs. By applying Vovk’s Aggregating Algorithm to this problem we obtain a number of algorith...
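
The protocol above can be illustrated with a simplified sketch: a plain exponential-weights learner (not Vovk's Aggregating Algorithm with its substitution function) that must commit to every prediction in a pack before any outcome is revealed and then performs a single delayed weight update. The experts, pack size, squared loss, and learning rate are assumptions made only for this example.

# Illustrative sketch of the prediction-of-packs protocol with expert advice.
import numpy as np

rng = np.random.default_rng(1)
n_experts, pack_size, n_packs, eta = 4, 3, 50, 1.0

weights = np.ones(n_experts) / n_experts    # learner's belief over experts
outcomes = rng.random(n_packs * pack_size)  # targets in [0, 1]

def expert_advice(t):
    # Hypothetical experts: constant predictors at different levels.
    return np.array([0.2, 0.4, 0.6, 0.8])

learner_loss = 0.0
expert_loss = np.zeros(n_experts)
for p in range(n_packs):
    steps = range(p * pack_size, (p + 1) * pack_size)
    # 1) Commit to predictions for the whole pack with the current weights.
    preds = {t: float(weights @ expert_advice(t)) for t in steps}
    # 2) Only now are the pack's outcomes revealed.
    learner_loss += sum((preds[t] - outcomes[t]) ** 2 for t in steps)
    # 3) One delayed weight update covering the entire pack.
    pack_losses = np.zeros(n_experts)
    for t in steps:
        pack_losses += (expert_advice(t) - outcomes[t]) ** 2
    expert_loss += pack_losses
    weights *= np.exp(-eta * pack_losses)
    weights /= weights.sum()

print(f"learner loss = {learner_loss:.3f}, best expert loss = {expert_loss.min():.3f}")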


arXiv:gr-qc/0110118v1 27 Oct 2001
Constancy of the Constants of Nature

The current observational and experimental limits on time variation of the constants of Nature are briefly reviewed.


arXiv:0710.3519v1 [cs.CC] 18 Oct 2007
P-matrix recognition is co-NP-complete

This is a summary of the proof by G.E. Coxson [1] that P-matrix recognition is co-NP-complete. The result follows by a reduction from the MAX CUT problem using results of S. Poljak and J. Rohn [5].




Publication date: 2017